Predict consumption for the next 3 months from a series of previous
consumption data together with exogenous variables. The variables
next_consume, next_2_consume and
next_3_consume are the dependent variables we want to
predict.
# Install the required packages (only needed once; skip if already installed).
install.packages(c("readr", "ranger", "dplyr", "skimr", "caret", "tidyr", "ggplot2"))
library(readr) # to read the dataset
library(ranger) # fast random forest implementation
library(dplyr) # to manipulate data
library(skimr) # to inspect the data
library(caret) # machine learning framework
library(ggplot2) # for the plots (ggplot() is used in the plotting code)
# dataset %>% select(Date)
# dataset %>% names() %>% as.data.frame()
skimr::skim(dataset)# %>% knitr::kable() %>% kable_styling(font_size = 9)
── Data Summary ────────────────────────
                           Values
Name                       dataset
Number of rows             532
Number of columns          648
_______________________
Column type frequency:
  character                1
  numeric                  647
________________________
Group variables            None
dataset <- dataset %>% tidyr::drop_na()
# Looking at the same data, the predictions for the next 3 months are similar.
# Random 80/20 split: setdiff() keeps the rows of dataset that are not in train.
train <- dataset %>% sample_frac(0.8)
test <- setdiff(dataset, train)
train
test
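The split relies on setdiff() matching whole rows. A minimal sketch of an index-based alternative follows, assuming a toy data frame `toy` (hypothetical, not the dataset used here) and an arbitrary seed chosen only so the split can be reproduced:

```r
# Toy data standing in for the real dataset (hypothetical values).
toy <- data.frame(id = 1:10, tmg = seq(0.1, 1.0, by = 0.1))

set.seed(42)  # arbitrary seed, only for reproducibility

# Draw 80% of the row indices without replacement for training.
train_idx <- sample(seq_len(nrow(toy)), size = round(0.8 * nrow(toy)))

toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]  # every row not drawn goes to the test set

# The two parts are disjoint and together cover all rows.
nrow(toy_train) + nrow(toy_test) == nrow(toy)
```

Splitting by index keeps every row in exactly one of the two sets, whatever the column values are.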
rf_model1 <- ranger(tmg ~ . ,data=train)
rf_model1
Ranger result
Call:
ranger(tmg ~ ., data = train)
Type: Regression
Number of trees: 500
Sample size: 426
Number of independent variables: 647
Mtry: 25
Target node size: 5
Variable importance mode: none
Splitrule: variance
OOB prediction error (MSE): 0.001175333
R squared (OOB): 0.8174207
rf_model1$prediction.error
[1] 0.001175333
The MSE and R squared values are computed on the OOB (out-of-bag)
observations. The OOB concept comes from
bootstrapping, the sampling technique used
to build each decision tree in a Random Forest. In
bootstrapping, a random sample is drawn from the training data
with replacement, which means that some instances
can be chosen several times while others are not
chosen at all. The instances left out of a tree's sample are out-of-bag for that tree, and each instance is evaluated using only the trees that did not see it during training.
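As a rough illustration of the sampling step, the sketch below draws a single bootstrap sample in base R and counts the rows that end up out-of-bag. The seed is arbitrary and the variable names are hypothetical; n is set to the training sample size reported in the output above.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 426     # training sample size reported by ranger above

# One bootstrap sample: n row indices drawn with replacement.
in_bag <- sample.int(n, size = n, replace = TRUE)

# Rows that were never drawn are out-of-bag for this tree.
oob <- setdiff(seq_len(n), in_bag)

# Observed OOB fraction for this draw, its expected value (1 - 1/n)^n,
# and the large-n limit exp(-1).
oob_fraction <- length(oob) / n
expected_fraction <- (1 - 1 / n)^n
c(observed = oob_fraction, expected = expected_fraction, limit = exp(-1))
```

On average a bit more than a third of the rows are out-of-bag for any given tree, so every observation gets an OOB prediction from a sizeable share of the forest.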
rf_model1 <- ranger(tmg ~ . ,data=train, importance = "impurity")
rf_model1
Ranger result
Call:
ranger(tmg ~ ., data = train, importance = "impurity")
Type: Regression
Number of trees: 500
Sample size: 426
Number of independent variables: 647
Mtry: 25
Target node size: 5
Variable importance mode: impurity
Splitrule: variance
OOB prediction error (MSE): 0.001174988
R squared (OOB): 0.8174742
impurity: this mode computes the importance of a feature from the decrease in node impurity obtained when the feature is used to split in the trees. For regression, as in this model, the impurity is the variance of the response; for classification it is the Gini index. The larger the decrease in impurity, the more important the feature is considered. Note that this is not ranger's default: when the importance argument is omitted the mode is "none", as the output of the first model shows.
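A minimal sketch of the idea for a single split, in base R. The helper names `sse` and `split_gain` and the toy values are hypothetical, and the scaling ranger uses internally may differ; the point is only that a split separating the response well yields a larger impurity decrease.

```r
# Sum of squared deviations from the mean: the node impurity for regression.
sse <- function(y) sum((y - mean(y))^2)

# Impurity decrease obtained by splitting on x at a given threshold.
split_gain <- function(y, x, threshold) {
  left  <- y[x <= threshold]
  right <- y[x > threshold]
  sse(y) - (sse(left) + sse(right))
}

# Toy data: the response jumps between x = 3 and x = 4.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.0, 1.1, 0.9, 3.0, 3.1, 2.9)

gain_good <- split_gain(y, x, threshold = 3)  # split at the jump
gain_bad  <- split_gain(y, x, threshold = 1)  # split away from the jump
c(good = gain_good, bad = gain_bad)
```

An impurity importance accumulates decreases like these for each feature over all the splits in all the trees where that feature is used.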
rf_model1$variable.importance
name psd_1 psd_2 psd_3 psd_4 psd_5 psd_6 psd_7 psd_8 psd_9 psd_10 psd_11
0.0261748227 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd_12 psd_13 psd_14 psd_15 psd_16 psd_17 psd_18 psd_19 psd_20 psd_21 psd_22 psd_23
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd_24 psd_25 psd_26 psd_27 psd_28 psd_29 psd_30 psd_31 psd_32 psd_33 psd_34 psd_35
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd_36 psd_37 psd_38 psd_39 psd_40 psd_41 psd_42 psd_43 psd_44 psd_45 psd_46 psd_47
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd_48 psd_49 psd_50 psd_51 psd_52 psd_53 psd_54 psd_55 psd_56 psd_57 psd_58 psd_59
0.0000000000 0.0000000000 0.0213483469 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0094678994 0.0000000000
psd_60 psd11_1 psd12_1 psd22_1 psd11_2 psd12_2 psd22_2 psd11_3 psd12_3 psd22_3 psd11_4 psd12_4
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd22_4 psd11_5 psd12_5 psd22_5 psd11_6 psd12_6 psd22_6 psd11_7 psd12_7 psd22_7 psd11_8 psd12_8
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd22_8 psd11_9 psd12_9 psd22_9 psd11_10 psd12_10 psd22_10 psd11_11 psd12_11 psd22_11 psd11_12 psd12_12
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd22_12 psd11_13 psd12_13 psd22_13 psd11_14 psd12_14 psd22_14 psd11_15 psd12_15 psd22_15 psd11_16 psd12_16
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd22_16 psd11_17 psd12_17 psd22_17 psd11_18 psd12_18 psd22_18 psd11_19 psd12_19 psd22_19 psd11_20 psd12_20
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd22_20 psd11_21 psd12_21 psd22_21 psd11_22 psd12_22 psd22_22 psd11_23 psd12_23 psd22_23 psd11_24 psd12_24
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd22_24 psd11_25 psd12_25 psd22_25 psd11_26 psd12_26 psd22_26 psd11_27 psd12_27 psd22_27 psd11_28 psd12_28
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd22_28 psd11_29 psd12_29 psd22_29 psd11_30 psd12_30 psd22_30 psd11_31 psd12_31 psd22_31 psd11_32 psd12_32
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd22_32 psd11_33 psd12_33 psd22_33 psd11_34 psd12_34 psd22_34 psd11_35 psd12_35 psd22_35 psd11_36 psd12_36
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd22_36 psd11_37 psd12_37 psd22_37 psd11_38 psd12_38 psd22_38 psd11_39 psd12_39 psd22_39 psd11_40 psd12_40
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd22_40 psd11_41 psd12_41 psd22_41 psd11_42 psd12_42 psd22_42 psd11_43 psd12_43 psd22_43 psd11_44 psd12_44
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd22_44 psd11_45 psd12_45 psd22_45 psd11_46 psd12_46 psd22_46 psd11_47 psd12_47 psd22_47 psd11_48 psd12_48
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd22_48 psd11_49 psd12_49 psd22_49 psd11_50 psd12_50 psd22_50 psd11_51 psd12_51 psd22_51 psd11_52 psd12_52
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0690166942 0.0721274150 0.0576661055 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd22_52 psd11_53 psd12_53 psd22_53 psd11_54 psd12_54 psd22_54 psd11_55 psd12_55 psd22_55 psd11_56 psd12_56
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd22_56 psd11_57 psd12_57 psd22_57 psd11_58 psd12_58 psd22_58 psd11_59 psd12_59 psd22_59 psd11_60 psd12_60
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0707740774 0.0774427979 0.0537484588 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
psd22_60 coordc_1 coordc_2 coordc_3 coordc_4 coordc_5 coordc_6 coordc_7 coordc_8 coordc_9 coordc_10 coordc_11
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0025208550
coordc_12 coordc_13 coordc_14 coordc_15 coordc_16 coordc_17 coordc_18 coordc_19 coordc_20 coordc_21 coordc_22 coordc_23
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0071077313 0.0000000000 0.0000000000
coordc_24 coordc_25 coordc_26 coordc_27 coordc_28 coordc_29 coordc_30 coordc_31 coordc_32 coordc_33 coordc_34 coordc_35
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
coordc_36 coordc_37 coordc_38 coordc_39 coordc_40 coordc_41 coordc_42 coordc_43 coordc_44 coordc_45 coordc_46 coordc_47
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0088438379 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
coordc_48 coordc_49 coordc_50 coordc_51 coordc_52 coordc_53 coordc_54 coordc_55 coordc_56 coordc_57 coordc_58 coordc_59
0.0000000000 0.0000000000 0.0000000000 0.0144601102 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
coordc_60 coordc_61 coordc_62 coordc_63 coordc_64 coordc_65 coordc_66 coordc_67 coordc_68 coordc_69 coordc_70 coordc_71
0.0000000000 0.0168602614 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0066789920
coordc_72 coordc_73 coordc_74 coordc_75 coordc_76 coordc_77 coordc_78 coordc_79 coordc_80 coordc_81 coordc_82 coordc_83
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0211646435 0.0000000000 0.0000000000
coordc_84 coordc_85 coordc_86 coordc_87 coordc_88 coordc_89 coordc_90 coordc_91 coordc_92 coordc_93 coordc_94 coordc_95
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
coordc_96 coordc_97 coordc_98 coordc_99 coordc_100 coordc_fe_1 coordc_fe_2 coordc_fe_3 coordc_fe_4 coordc_fe_5 coordc_fe_6 coordc_fe_7
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
coordc_fe_8 coordc_fe_9 coordc_fe_10 coordc_fe_11 coordc_fe_12 coordc_fe_13 coordc_fe_14 coordc_fe_15 coordc_fe_16 coordc_fe_17 coordc_fe_18 coordc_fe_19
0.0000000000 0.0000000000 0.0000000000 0.0044248989 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
coordc_fe_20 coordc_fe_21 coordc_fe_22 coordc_fe_23 coordc_fe_24 coordc_fe_25 coordc_fe_26 coordc_fe_27 coordc_fe_28 coordc_fe_29 coordc_fe_30 coordc_fe_31
0.0000000000 0.0151897063 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
coordc_fe_32 coordc_fe_33 coordc_fe_34 coordc_fe_35 coordc_fe_36 coordc_fe_37 coordc_fe_38 coordc_fe_39 coordc_fe_40 coordc_fe_41 coordc_fe_42 coordc_fe_43
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0184656877 0.0000000000 0.0000000000
coordc_fe_44 coordc_fe_45 coordc_fe_46 coordc_fe_47 coordc_fe_48 coordc_fe_49 coordc_fe_50 coordc_fe_51 coordc_fe_52 coordc_fe_53 coordc_fe_54 coordc_fe_55
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0145968020 0.0000000000 0.0000000000 0.0000000000 0.0000000000
coordc_fe_56 coordc_fe_57 coordc_fe_58 coordc_fe_59 coordc_fe_60 coordc_fe_61 coordc_fe_62 coordc_fe_63 coordc_fe_64 coordc_fe_65 coordc_fe_66 coordc_fe_67
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0118686704 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
coordc_fe_68 coordc_fe_69 coordc_fe_70 coordc_fe_71 coordc_fe_72 coordc_fe_73 coordc_fe_74 coordc_fe_75 coordc_fe_76 coordc_fe_77 coordc_fe_78 coordc_fe_79
0.0000000000 0.0000000000 0.0000000000 0.0174568488 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
coordc_fe_80 coordc_fe_81 coordc_fe_82 coordc_fe_83 coordc_fe_84 coordc_fe_85 coordc_fe_86 coordc_fe_87 coordc_fe_88 coordc_fe_89 coordc_fe_90 coordc_fe_91
0.0000000000 0.0434497086 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
coordc_fe_92 coordc_fe_93 coordc_fe_94 coordc_fe_95 coordc_fe_96 coordc_fe_97 coordc_fe_98 coordc_fe_99 coordc_fe_100 coordc_ni_1 coordc_ni_2 coordc_ni_3
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
coordc_ni_4 coordc_ni_5 coordc_ni_6 coordc_ni_7 coordc_ni_8 coordc_ni_9 coordc_ni_10 coordc_ni_11 coordc_ni_12 coordc_ni_13 coordc_ni_14 coordc_ni_15
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0032470147 0.0000000000 0.0000000000 0.0000000000 0.0000000000
coordc_ni_16 coordc_ni_17 coordc_ni_18 coordc_ni_19 coordc_ni_20 coordc_ni_21 coordc_ni_22 coordc_ni_23 coordc_ni_24 coordc_ni_25 coordc_ni_26 coordc_ni_27
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0128599356 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
coordc_ni_28 coordc_ni_29 coordc_ni_30 coordc_ni_31 coordc_ni_32 coordc_ni_33 coordc_ni_34 coordc_ni_35 coordc_ni_36 coordc_ni_37 coordc_ni_38 coordc_ni_39
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
coordc_ni_40 coordc_ni_41 coordc_ni_42 coordc_ni_43 coordc_ni_44 coordc_ni_45 coordc_ni_46 coordc_ni_47 coordc_ni_48 coordc_ni_49 coordc_ni_50 coordc_ni_51
0.0000000000 0.0169897201 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0098439979
coordc_ni_52 coordc_ni_53 coordc_ni_54 coordc_ni_55 coordc_ni_56 coordc_ni_57 coordc_ni_58 coordc_ni_59 coordc_ni_60 coordc_ni_61 coordc_ni_62 coordc_ni_63
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0169251622 0.0000000000 0.0000000000
coordc_ni_64 coordc_ni_65 coordc_ni_66 coordc_ni_67 coordc_ni_68 coordc_ni_69 coordc_ni_70 coordc_ni_71 coordc_ni_72 coordc_ni_73 coordc_ni_74 coordc_ni_75
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0120012339 0.0000000000 0.0000000000 0.0000000000 0.0000000000
coordc_ni_76 coordc_ni_77 coordc_ni_78 coordc_ni_79 coordc_ni_80 coordc_ni_81 coordc_ni_82 coordc_ni_83 coordc_ni_84 coordc_ni_85 coordc_ni_86 coordc_ni_87
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0286253719 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
coordc_ni_88 coordc_ni_89 coordc_ni_90 coordc_ni_91 coordc_ni_92 coordc_ni_93 coordc_ni_94 coordc_ni_95 coordc_ni_96 coordc_ni_97 coordc_ni_98 coordc_ni_99
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
coordc_ni_100 pec_1 pec_2 pec_3 pec_4 pec_5 pec_6 pec_7 pec_8 pec_9 pec_10 pec_11
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
pec_12 pec_13 pec_14 pec_15 pec_16 pec_17 pec_18 pec_19 pec_20 pec_21 pec_22 pec_23
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0031291087 0.0227018934 0.0321348274 0.0285505033 0.0265709627
pec_24 pec_25 pec_26 pec_27 pec_28 pec_29 pec_30 pec_31 pec_32 pec_33 pec_34 pec_35
0.0068347184 0.0583014758 0.0128612956 0.0995415681 0.1849249501 0.0640673810 0.0188584171 0.0626721214 0.0837703340 0.0141363145 0.0104518916 0.0114475262
pec_36 pec_37 pec_38 pec_39 pec_40 pec_41 pec_42 pec_43 pec_44 pec_45 pec_46 pec_47
0.0161978632 0.0202804911 0.0173945538 0.0267245812 0.0343740395 0.0156229130 0.0174097906 0.0067164145 0.0061465131 0.0069234907 0.0123972543 0.0313430392
pec_48 pec_49 pec_50 pec_51 pec_52 pec_53 pec_54 pec_55 pec_56 pec_57 pec_58 pec_59
0.0173603900 0.0260202814 0.0556116195 0.0699605699 0.0576901226 0.0154313787 0.0125424197 0.0141669358 0.0232782853 0.0185153903 0.0104149366 0.0119663777
pec_60 pec_61 pec_62 pec_63 pec_64 pec_65 pec_66 pec_67 pec_68 pec_69 pec_70 pec_71
0.0180693435 0.0418654130 0.0801950440 0.0124100320 0.0101589735 0.0081179298 0.0054377203 0.0054112043 0.0059716033 0.0064917542 0.0029708978 0.0161584140
pec_72 pec_73 pec_74 pec_75 pec_76 pec_77 pec_78 pec_79 pec_80 pec_81 pec_82 pec_83
0.0169298940 0.0041864823 0.0096476878 0.0047093208 0.0059649995 0.0046060194 0.0099199141 0.0029295623 0.0025814292 0.0028573451 0.0029830054 0.0056214248
pec_84 pec_85 pec_86 pec_87 pec_88 pec_89 pec_90 pec_91 pec_92 pec_93 pec_94 pec_95
0.0048704332 0.0005914519 0.0010063448 0.0067050386 0.0058793785 0.0019469364 0.0007929125 0.0059452211 0.0023318256 0.0017673131 0.0002195631 0.0003543456
pec_96 pec_97 pec_98 pec_99 pec_100 fe_s ni_s fe_c ni_c n_fe n_ni
0.0002307600 0.0005581859 0.0016181635 0.0017680639 0.0003468896 0.0493928509 0.0306639906 0.0401905849 0.0338549970 0.0366818048 0.0311937743
data.frame(impurity=rf_model1$variable.importance) %>% arrange(desc(impurity))
rf_model1 <- ranger(tmg ~ pec_28 + pec_27 + pec_32 + pec_62 + psd12_58 + psd12_50 + psd11_58 + pec_51 + psd11_50 + pec_29,data=train, importance = "impurity")
rf_model1
Ranger result
Call:
ranger(tmg ~ pec_28 + pec_27 + pec_32 + pec_62 + psd12_58 + psd12_50 + psd11_58 + pec_51 + psd11_50 + pec_29, data = train, importance = "impurity")
Type: Regression
Number of trees: 500
Sample size: 426
Number of independent variables: 10
Mtry: 3
Target node size: 5
Variable importance mode: impurity
Splitrule: variance
OOB prediction error (MSE): 0.001202208
R squared (OOB): 0.8132459
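predictions1 <- predict(rf_model1, data = test, type = "response")
predictions1$predictions %>% as.data.frame()
# Mean squared error between actual and predicted values.
mse <- function(act, pred) mean((act - pred)^2)
# MSE on the held-out test set, to compare with the OOB estimate above.
data.frame(pred = predictions1$predictions, act = test$tmg) %>% summarise(mse = mse(act, pred))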
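To make the test-set MSE easier to read, it can be turned into an RMSE (same units as the target) and into an R squared against a baseline that always predicts the mean. The sketch below uses hypothetical actual and predicted values, not the results of this model.

```r
mse <- function(act, pred) mean((act - pred)^2)

# Hypothetical actual and predicted values, for illustration only.
act  <- c(0.50, 0.62, 0.71, 0.55, 0.80)
pred <- c(0.52, 0.60, 0.75, 0.50, 0.78)

test_mse  <- mse(act, pred)
test_rmse <- sqrt(test_mse)                      # same units as the target
test_r2   <- 1 - test_mse / mse(act, mean(act))  # 1 - MSE / MSE of the mean baseline
c(mse = test_mse, rmse = test_rmse, r2 = test_r2)
```

Comparing these test-set figures with the OOB MSE and R squared printed by ranger gives a quick check that the OOB estimate is not overly optimistic.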
Instead of the mean, quantiles are used to obtain a prediction interval (Meinshausen, 2006). The trees are grown as in a standard regression forest, which is why the output still reports Splitrule: variance. The difference is in the leaves: rather than keeping only the mean of each leaf, the forest keeps the response values of the observations that fall in it, and the requested quantiles are estimated from those values.
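A minimal sketch of what changes at the leaf level, using hypothetical response values for a single leaf. An actual quantile regression forest combines observations across all trees, so this only illustrates the mean-versus-quantiles step.

```r
# Hypothetical response values of the training observations in one leaf.
leaf_values <- c(0.42, 0.45, 0.47, 0.50, 0.51, 0.53, 0.58, 0.61, 0.66, 0.70)

# Standard regression forest: the leaf contributes its mean.
point_prediction <- mean(leaf_values)

# Quantile regression forest: the leaf values are kept, so quantiles
# such as the 10%, 50% and 90% levels can be read off them.
interval <- quantile(leaf_values, probs = c(0.1, 0.5, 0.9))

point_prediction
interval
```

The 10% and 90% quantiles bound an 80% prediction interval around the median.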
rf_model1 <- ranger(tmg ~ . ,data=train, importance = "impurity",quantreg = TRUE)
rf_model1
Ranger result
Call:
ranger(tmg ~ ., data = train, importance = "impurity", quantreg = TRUE)
Type: Regression
Number of trees: 500
Sample size: 426
Number of independent variables: 647
Mtry: 25
Target node size: 5
Variable importance mode: impurity
Splitrule: variance
OOB prediction error (MSE): 0.001160973
R squared (OOB): 0.8196514
predictions1 <- predict(rf_model1,data = test, type= "quantiles")
predictions1$predictions %>% as.data.frame()
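# data.frame() renames the quantile columns to quantile..0.1, quantile..0.5 and quantile..0.9.
p1 <- data.frame(predictions1$predictions, act = test$tmg, label = "Mag prediction")
# Median prediction (red) with the 10%-90% interval (orange) against the actual value.
p1 %>% ggplot() +
  geom_point(aes(x = act, y = quantile..0.5), color = "red") +
  geom_errorbar(aes(x = act, ymax = quantile..0.9, ymin = quantile..0.1), color = "orange") +
  theme_classic()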
A confidence interval is similar to a prediction interval, and the two are often confused in Random Forest implementations. In general terms, a prediction interval applies to an individual observation, while a confidence interval applies to a statistic, such as the estimated mean. A comparable example is the difference between the standard deviation of a variable and the standard error computed over a set of samples.
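The distinction can be sketched with a plain sample mean in base R. The data are simulated, the seed is arbitrary, and both intervals use a rough normal approximation; the prediction interval here also ignores the uncertainty of the estimated mean.

```r
set.seed(7)  # arbitrary seed, only for reproducibility
x <- rnorm(100, mean = 10, sd = 2)
n <- length(x)

s  <- sd(x)        # spread of individual observations
se <- s / sqrt(n)  # uncertainty of the estimated mean

z <- qnorm(0.975)
conf_int <- mean(x) + c(-1, 1) * z * se  # where the true mean plausibly lies
pred_int <- mean(x) + c(-1, 1) * z * s   # where a new observation plausibly falls

rbind(confidence = conf_int, prediction = pred_int)
```

The confidence interval shrinks as n grows, while the prediction interval stays about as wide as the data themselves.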
If we change the training dataset just a little bit, will Random Forest give you the same result for that particular example?
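The standard errors are based on a technique called the jackknife (Wager et al., 2014). In ranger they are requested with type = "se" in predict(), which requires a forest fitted with keep.inbag = TRUE.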
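For intuition, here is the classic leave-one-out jackknife applied to a simple statistic, the sample mean. This is a sketch of the general technique on simulated data, not the forest-specific estimator described by Wager et al.

```r
set.seed(3)  # arbitrary seed, only for reproducibility
x <- rnorm(30)
n <- length(x)

# Recompute the statistic n times, leaving one observation out each time.
loo_means <- vapply(seq_len(n), function(i) mean(x[-i]), numeric(1))

# Jackknife standard error: scaled spread of the leave-one-out estimates.
jack_se <- sqrt((n - 1) / n * sum((loo_means - mean(loo_means))^2))

# For the mean, this matches the textbook formula sd(x) / sqrt(n).
all.equal(jack_se, sd(x) / sqrt(n))
```

The forest version follows the same logic of asking how much a prediction changes when training observations are left out, using the in-bag counts that keep.inbag = TRUE stores.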
rf_model1 <- ranger(tmg ~ ., data=train, importance = "impurity",keep.inbag = TRUE)
rf_model1
Ranger result
Call:
ranger(tmg ~ ., data = train, importance = "impurity", keep.inbag = TRUE)
Type: Regression
Number of trees: 500
Sample size: 426
Number of independent variables: 647
Mtry: 25
Target node size: 5
Variable importance mode: impurity
Splitrule: variance
OOB prediction error (MSE): 0.001156417
R squared (OOB): 0.8203593
predictions1 <- predict(rf_model1,data = train, type= "se")
predictions1$predictions %>% as.data.frame()
predictions1$se %>% as.data.frame()
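# Prediction (red) with plus/minus one standard error (orange) against the actual value.
data.frame(pred = predictions1$predictions, se = predictions1$se, act = train$tmg) %>% ggplot() +
  geom_point(aes(x = act, y = pred), color = "red") +
  geom_errorbar(aes(x = act, ymax = pred + se, ymin = pred - se), color = "orange") +
  theme_classic()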